Automated essay scoring (AES) is the use of specialized computer programs to assign grades to essays written in an educational setting. It is a method of educational assessment and an application of natural language processing. Its objective is to classify a large set of textual entities into a small number of discrete categories corresponding to the possible grades (for example, the numbers 1 to 6). Therefore, it can be considered a problem of statistical classification.

Several factors have contributed to a growing interest in AES. Among them are cost, accountability, standards, and technology. Rising education costs have led to pressure to hold the educational system accountable for results by imposing standards. The advance of information technology promises to measure educational achievement at reduced cost.

The use of AES for high-stakes testing in education has generated significant backlash, with opponents pointing to research that computers cannot yet grade writing accurately and arguing that their use for such purposes promotes teaching writing in reductive ways (i.e. teaching to the test).

==History==

Most historical summaries of AES trace the origins of the field to the work of Ellis Batten Page.〔Page, E.B. (2003). "Project Essay Grade: PEG", p. 43. In: ''Automated Essay Scoring: A Cross-Disciplinary Perspective''. Shermis, Mark D., and Jill Burstein, eds. Lawrence Erlbaum Associates, Mahwah, New Jersey, ISBN 0805839739〕〔Larkey, Leah S., and W. Bruce Croft (2003). "A Text Categorization Approach to Automated Essay Grading", p. 55. In: ''Automated Essay Scoring: A Cross-Disciplinary Perspective''. Shermis, Mark D., and Jill Burstein, eds. Lawrence Erlbaum Associates, Mahwah, New Jersey, ISBN 0805839739〕〔Keith, Timothy Z. (2003). "Validity of Automated Essay Scoring Systems", p. 153. In: ''Automated Essay Scoring: A Cross-Disciplinary Perspective''. Shermis, Mark D., and Jill Burstein, eds. Lawrence Erlbaum Associates, Mahwah, New Jersey, ISBN 0805839739〕〔Shermis, Mark D., Jill Burstein, and Claudia Leacock (2006). "Applications of Computers in Assessment and Analysis of Writing", p. 403. In: ''Handbook of Writing Research''. MacArthur, Charles A., Steve Graham, and Jill Fitzgerald, eds. Guilford Press, New York, ISBN 1-59385-190-1〕〔Attali, Yigal, Brent Bridgeman, and Catherine Trapani (2010). "Performance of a Generic Approach in Automated Essay Scoring", p. 4. Journal of Technology, Learning, and Assessment, 10(3)〕〔Wang, Jinhao, and Michelle Stallone Brown (2007). "Automated Essay Scoring Versus Human Scoring: A Comparative Study", p. 6. Journal of Technology, Learning, and Assessment, 6(2)〕〔Bennett, Randy Elliot, and Anat Ben-Simon (2005). (Toward Theoretically Meaningful Automated Essay Scoring ), p. 6. Retrieved 2012-03-19.〕 In 1966, he argued〔Page, E.B. (1966). "The imminence of grading essays by computers". Phi Delta Kappan, 47, 238-243.〕 for the possibility of scoring essays by computer, and in 1968 he published〔Page, E.B. (1968). "The Use of the Computer in Analyzing Student Essays". International Review of Education, 14(3), 253-263.〕 his successful work with a program called Project Essay Grade™ (PEG™). Using the technology of that time, computerized essay scoring would not have been cost-effective,〔Page, E.B. (2003), pp. 44-45.〕 so Page abated his efforts for about two decades. By 1990, desktop computers had become so powerful and so widespread that AES was a practical possibility.
As early as 1982, a UNIX program called Writer's Workbench was able to offer punctuation, spelling, and grammar advice.〔MacDonald, N.H., L.T. Frase, P.S. Gingrich, and S.A. Keenan (1982). "The Writers Workbench: Computer Aids for Text Analysis". IEEE Transactions on Communications, 3(1), 105-110.〕 In collaboration with several companies (notably Educational Testing Service), Page updated PEG and ran some successful trials in the early 1990s.〔Page, E.B. (1994). "New Computer Grading of Student Prose, Using Modern Concepts and Software". Journal of Experimental Education, 62(2), 127-142.〕

Peter Foltz and Thomas Landauer developed a system using a scoring engine called the Intelligent Essay Assessor™ (IEA). IEA was first used to score essays in 1997 for their undergraduate courses.〔Rudner, Lawrence. "(Three prominent writing assessment programs )". Retrieved 2012-03-06.〕 It is now a product of Pearson Educational Technologies and is used for scoring within a number of commercial products and state and national exams.

IntelliMetric® is Vantage Learning's AES engine. Its development began in 1996.〔Elliot, Scott (2003). "Intellimetric TM: From Here to Validity", p. 75. In: ''Automated Essay Scoring: A Cross-Disciplinary Perspective''. Shermis, Mark D., and Jill Burstein, eds. Lawrence Erlbaum Associates, Mahwah, New Jersey, ISBN 0805839739〕 It was first used commercially to score essays in 1998.〔"(IntelliMetric®: How it Works )". Retrieved 2012-02-28.〕

Educational Testing Service offers e-rater®, an automated essay scoring program. It was first used commercially in February 1999.〔Burstein, Jill (2003). "The E-rater(R) Scoring Engine: Automated Essay Scoring with Natural Language Processing", p. 113. In: ''Automated Essay Scoring: A Cross-Disciplinary Perspective''. Shermis, Mark D., and Jill Burstein, eds. Lawrence Erlbaum Associates, Mahwah, New Jersey, ISBN 0805839739〕 Jill Burstein was the team leader in its development. ETS's CriterionSM Online Writing Evaluation Service uses the e-rater engine to provide both scores and targeted feedback.

Lawrence Rudner has done some work with Bayesian scoring and developed a system called BETSY (Bayesian Essay Test Scoring sYstem).〔Rudner, Lawrence (ca. 2002). "(Computer Grading using Bayesian Networks-Overview )". Retrieved 2012-03-07.〕 Some of his results have been published in print or online, but no commercial system incorporates BETSY as yet; a schematic illustration of this kind of Bayesian classification appears below.

Under the leadership of Howard Mitzel and Sue Lottridge, Pacific Metrics developed a constructed-response automated scoring engine, CRASE®. Currently utilized by several state departments of education and in a U.S. Department of Education-funded Enhanced Assessment Grant, Pacific Metrics’ technology has been used in large-scale formative and summative assessment environments since 2007.

Measurement Inc. acquired the rights to PEG in 2002 and has continued to develop it.〔"(Assessment Technologies )", Measurement Incorporated. Retrieved 2012-03-09.〕

In 2012, the Hewlett Foundation sponsored a competition on Kaggle called the Automated Student Assessment Prize (ASAP).〔"(Hewlett prize )". Retrieved 2012-03-05.〕 A total of 201 challenge participants attempted to predict, using AES, the scores that human raters would give to thousands of essays written to eight different prompts. The intent was to demonstrate that AES can be as reliable as human raters, or more so. The competition also hosted a separate demonstration among nine AES vendors on a subset of the ASAP data.
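Rudner's Bayesian approach illustrates the general framing given in the lead: essay scoring treated as statistical classification, in which features extracted from a text are used to estimate how probable each discrete score category is. The sketch below is purely illustrative; the tiny training set, the bag-of-words features, and the choice of scikit-learn are assumptions made for demonstration and do not describe BETSY or any commercial engine.

<syntaxhighlight lang="python">
# Purely illustrative sketch: essay scoring framed as statistical classification.
# The data, features, and library choice are assumptions for demonstration only;
# this is not BETSY or any commercial engine.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical essays that human raters have already scored on a 1-6 scale.
train_essays = [
    "The author develops a clear, well-supported argument about technology in schools.",
    "school is good becuse you lern stuff and then you get a job",
    "The essay raises several relevant points but offers little supporting evidence.",
]
train_scores = [6, 2, 4]  # the discrete grade categories

# Bag-of-words counts feed a multinomial naive Bayes classifier, which estimates
# P(score category | observed words) and predicts the most probable category.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_essays, train_scores)

new_essay = "Technology can improve learning when teachers receive adequate training."
print("Predicted score category:", model.predict([new_essay])[0])
</syntaxhighlight>

Operational engines are trained on far larger collections of human-scored essays and draw on much richer features, but the underlying task is the same one described in the lead: mapping texts onto a small number of grade categories.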
Although the investigators reported that the automated essay scoring was as reliable as human scoring,〔Shermis, Mark D., and Jill Burstein, eds. (2013). ''Handbook of Automated Essay Evaluation: Current Applications and New Directions''. Routledge.〕 this claim was not substantiated by any statistical tests, because some of the vendors required that no such tests be performed as a precondition for their participation. Moreover, the claim that the Hewlett Study demonstrated that AES can be as reliable as human raters has since been strongly contested,〔Perelman, L. (2014). "When 'the state of the art is counting words'". Assessing Writing, 21, 104-111.〕 including by Randy E. Bennett, the Norman O. Frederiksen Chair in Assessment Innovation at the Educational Testing Service. Among the major criticisms of the study are that five of the eight datasets consisted of paragraphs rather than essays, that four of the eight datasets were graded by human readers for content only rather than for writing ability, and that rather than measuring human readers and the AES machines against the "true score" (the average of the two readers' scores), the study employed an artificial construct, the "resolved score", which in four datasets consisted of the higher of the two human scores if there was a disagreement. This last practice, in particular, gave the machines an unfair advantage by allowing them to round up for these datasets; a worked comparison is given below.

The two multi-state consortia funded by the U.S. Department of Education to develop next-generation assessments, the Partnership for Assessment of Readiness for College and Careers (PARCC) and the Smarter Balanced Assessment Consortium, are committed to the challenge of transitioning from paper-and-pencil to computer-based testing by the 2014-2015 school year. As state agencies implement the Common Core State Standards, they are making decisions about the next-generation assessments and how to accurately measure the new level of rigor. Automated scoring software that can faithfully replicate how trained educators evaluate a student’s written response offers one approach for states to meet that challenge.
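To make the "resolved score" objection concrete, consider a hypothetical essay on which the two human readers disagree, assigning scores <math>r_1 = 3</math> and <math>r_2 = 4</math> (the numbers and notation here are illustrative only). Under the definitions above,

<math>\text{true score} = \frac{r_1 + r_2}{2} = 3.5, \qquad \text{resolved score} = \max(r_1, r_2) = 4.</math>

A machine that predicts 4 matches the resolved score exactly while sitting half a point above the average of the two human readers, so evaluating the machines against the resolved score rewards precisely the upward rounding described in the criticism above.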